PIPES: A Multi-Threaded Publish-Subscribe Architecture for Continuous Queries over Streaming Data Sources
Abstract
In contrast to traditional query processing based on persistent data, new application scenarios arise that rely heavily on the continuous evaluation of data streams. These streams often originate from autonomous data sources. In this paper, we present the novel publish-subscribe architecture PIPES (Public Infrastructure for Processing and Exploring Streams), which makes it easy to compose complex query graphs with inherently dynamic resource sharing, even at runtime. We sketch essential design and implementation considerations of PIPES, which contains a generic operator framework for streaming data. PIPES is fully implemented and integrated into XXL, a flexible and extensible Java library for data processing. In contrast to related system prototypes, in which communication between operators depends solely on queues, we permit direct interoperability. In our new hybrid scheduling approach for PIPES, we intend to dynamically vary the number of concurrently running lightweight processes, called threads. First experimental studies show the advantages of being neither restricted to a single thread nor forced to assign a separate thread to each operator.
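The idea of operators that publish directly to their subscribers, rather than through intermediate queues, can be illustrated with a minimal sketch. It is written in Python for brevity and is not the PIPES or XXL API: the names `Node`, `Filter`, `Map`, `Collector`, `subscribe`, `process`, `transfer` and `demo` are all hypothetical and chosen only for this illustration. The sketch is single-threaded and leaves out the hybrid scheduling described in the abstract.

```python
from typing import Any, Callable, List, Tuple


class Node:
    """Hypothetical operator base: every node can publish and subscribe."""

    def __init__(self) -> None:
        self.subscribers: List["Node"] = []

    def subscribe(self, sink: "Node") -> "Node":
        # Subscriptions may be added while elements are already flowing.
        self.subscribers.append(sink)
        return sink  # returned so that operators can be chained

    def process(self, element: Any) -> None:
        # Default behaviour: pass the element through unchanged.
        self.transfer(element)

    def transfer(self, element: Any) -> None:
        # Direct interoperability (assumed reading of the abstract): call each
        # subscriber synchronously instead of placing the element in a queue.
        for sink in list(self.subscribers):
            sink.process(element)


class Filter(Node):
    def __init__(self, predicate: Callable[[Any], bool]) -> None:
        super().__init__()
        self.predicate = predicate

    def process(self, element: Any) -> None:
        if self.predicate(element):
            self.transfer(element)


class Map(Node):
    def __init__(self, function: Callable[[Any], Any]) -> None:
        super().__init__()
        self.function = function

    def process(self, element: Any) -> None:
        self.transfer(self.function(element))


class Collector(Node):
    """Terminal node that simply records what it receives."""

    def __init__(self) -> None:
        super().__init__()
        self.results: List[Any] = []

    def process(self, element: Any) -> None:
        self.results.append(element)


def demo() -> Tuple[List[int], List[int]]:
    source = Node()
    evens = source.subscribe(Filter(lambda x: x % 2 == 0))
    squares = Collector()
    evens.subscribe(Map(lambda x: x * x)).subscribe(squares)

    for x in range(4):
        source.process(x)

    # A second query attaches to the shared filter while the graph is running.
    raw = Collector()
    evens.subscribe(raw)
    for x in range(4, 8):
        source.process(x)

    return squares.results, raw.results
```

In `demo()`, both queries reuse the same `Filter` instance, which is one simple way to picture resource sharing in a query graph. The second collector only sees elements that arrive after it subscribes.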
Similar resources
Optimizing large collections of continuous content-based RSS aggregation queries
In this article we present RoSeS (Really Open Simple and Efficient Syndication), a generic framework for content-based RSS feed querying and aggregation. RoSeS is based on a data-centric approach, using a combination of standard database concepts like declarative query languages, views and multi-query optimization. Users create personalized feeds by defining and composing content-based filterin...
RoSeS: A Continuous Content-Based Query Engine for RSS Feeds
In this article we present RoSeS (Really Open Simple and Efficient Syndication), a generic framework for content-based RSS feed querying and aggregation. RoSeS is based on a data-centric approach, using a combination of standard database concepts like declarative query languages, views and multiquery optimization. Users create personalized feeds by defining and composing content-based filtering...
High-throughput Publish/Subscribe on top of LSM-based Storage
State-of-the-art publish/subscribe systems are efficient when the subscriptions are relatively static – for instance, the set of followers in Twitter – or can fit in memory. However, nowadays, many Big Data and IoT-based applications follow a highly dynamic query paradigm, where both continuous queries and data entries are in the millions and can arrive and expire rapidly. In this paper we pr...
Top-k/w publish/subscribe: A publish/subscribe model for continuous top-k processing over data streams
Continuous processing of top-k queries over data streams is a promising technique for alleviating the information overload problem as it distinguishes relevant from irrelevant data stream objects with respect to a given scoring function over time. Thus it enables filtering of irrelevant data objects and delivery of top-k objects relevant to user interests in real-time. We propose a solution for...
Publish/Subscribe with RDF Data over Large Structured Overlay Networks
We study the problem of evaluating RDF queries over structured overlay networks. We consider the publish/subscribe scenario where nodes subscribe with long-standing queries and receive notifications whenever triples matching their queries are inserted in the network. In this paper we focus on conjunctive multi-predicate queries. We demonstrate that these queries are useful in various modern app...